-
Notifications
You must be signed in to change notification settings - Fork 822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eks-prow-build-cluster: Use dedicated Managed Node Groups (MNGs) per Availability Zone (AZ) #6320
Conversation
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: upodroid, xmudrii The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
cancel the hold when you are ready to merge |
Let's get this merged and I'll create a new PR for follow-ups |
|
||
cluster_version = var.node_group_version_us_east_2a | ||
|
||
taints = var.node_taints_build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI - you can use eks_managed_node_group_defaults
to set the default values you wish to use across all of the managed nodegroups, and then any specific settings unique to that nodegroup can be set in the nodegroup definition. You can also set the defaults and still override the value in the nodegroup definition as well
So roughly speaking, something like the following might help cut down on the copy+pasta across nodegroup definitions:
eks_managed_node_group_defaults = {
use_name_prefix = true
taints = var.node_taints_build
labels = var.node_labels_build
... anything else that you want common across the nodegroups
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's neat, thanks for pointing it out! I'll keep it in mind and see if we can make use of it the next time we do some rollout
xref #6303 (see for more details and context)
As per recommendations received from the AWS support and @tzneal, we're replacing blue and green node groups with node groups per AZ. In other words, old node groups had instances in all three AZs. Now, we have a dedicated node group for each AZ. This is a short-term solution to fix the stability issues that we're facing.
This has been successfully rolled out to canary, I'll do prod rollout once this PR is reviewed and merged.
Notes:
count
because that would make rolling out upgrades too complicatedFollow-ups:
/assign @upodroid @ameukam @dims